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Abstract Wc consider a class of optimization problems for sparse signal re- 
construction which arise in the held of Compressed Sensing (CS). A plethora of 
approaches and solvers exist for such problems, for example GPSR, FPC_AS, 
SPGL1, NestA, 4_4, PDCO to mention a few. 

Compressed Sensing applications lead to very well conditioned optimiza- 
tion problems and therefore can be solved easily by simple first-order meth- 
ods. Interior point methods (IPMs) rely on the Newton method hence they 
use the second-order information. They have numerous advantageous features 
and one clear drawback: being the second-order approach they need to solve 
linear equations and this operation has (in the general dense case) an 0(n 3 ) 
computational complexity. Attempts have been made to specialize IPMs to 
sparse reconstruction problems and they have led to interesting developments 
implemented in i\JL s and PDCO softwares. We go a few steps further. First, 
we use the matrix-free interior point method, an approach which redesigns 
IPM to avoid the need to explicitly formulate (and store) the Newton equa- 
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tion systems. Secondly, we exploit the special features of the signal processing 
matrices within the matrix-free IPM. Two such features are of particular in- 
terest: an excellent conditioning of these matrices and the ability to perform 
inexpensive (low complexity) matrix-vector multiplications with them. 

Computational experience with large scale one-dimensional signals con- 
firms that the new approach is efficient and compares favorably with other 
state-of-the-art solvers. 

Keywords Matrix- free Interior Point, Preconditioned Conjugate Gradient, 
Compressed Sensing, Compressive Sampling, £i-regularization. 
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1 Introduction 

We are concerned with the solution of the incomplete system of linear equa- 
tions 

Ax = b, (1) 

where A £ IR mx ", x £ M. n , b £ M. m and m < n. In particular, we are interested 
in the solution x with the smallest possible number of nonzero elements, oth- 
erwise known as the sparsest solution x. Such problems arise in the fields of 
Statistics [25J, [23], [31] and Signal processing [5], [8], [3D], [TJ, [5J. 

The sparsest solution x of system ([!]) can be found by solving the following 
program: 

min ||a;||o 

xgl" ^ (2) 

subject to: Ax = b, 

where x £ R n , A £ R mxn , b £ W' n and ||x|| Q = {# of nonzero entries in x}. 
The use of zero-norm makes the problem combinatorial and untractable in 
practice. Recent advances in the field of Compressed Sensing show that in 
certain situations (|11|. [BJ) exact recovery of the sparsest solution x of can 
be achieved with an overwhelming probability by solving the following Basis 
Pursuit [TDJ problem: 



BP: xe 



mm \\x\\i 

(3) 



subject to: Ax = 6, 

where x £ E™ , A £ R mxn , b £ W n and ||a:||i = £™ =1 \x t \. The problem ^ 
has a major advantage over Unlike the zero- norm objective in the t\- 
norm objective in Q can be reformulated as a linear function and therefore the 
problem (|3| may be recast as a linear program and becomes computationally 
tractable. Having a linear reformulation of ([3]), standard efficient optimization 
methods can be used to recover the sparsest solution x. 
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In real-life applications the right hand side of ([!]) is often corrupted with 
noise and ([T]) is replaced with: 



Ax 



(4) 



where e € R m denotes the error: we assume it has a normal distribution 
ei ~ A/"(0, er 2 ), Vi = 1, m. For the noisy case Q the sparsest solution x can 



be found by solving one of the following programs 
BPDN: 



min r||x||i + \\Ax — 6| 1 1 



LASSO: 



\\Ax-b\\ 2 
subject to: \\x\U < t\ 



mm 



(5a) 
(5b) 



BP, 



mm 



Pill 



subject to: \\Ax — b\\2 < £2 



(5c) 



where r, t\ and 62 are positive scalars that regulate the sparsity and the upper 



bound on the noise error, respectively. Program (5a) is the well-known Basis 
Pursuit Denoising problem introduced in |10j . program (5b I is the Least Ab- 



solute Shrinkage and Selection Operator (LASSO) used frequently in the field 
of computational statistics [29]. It can be shown using Theorem 27.4 from [26] 
that the formulations in ([5]) are equivalent for specific values of scalars r, ei 
and £2- 

Practical problems have large dimensions and of-the-shelf approaches such 
as the simplex method or the (standard) interior point method are often im- 
practical. However, matrices A that appear in compressed sensing problems 
display several attractive features which may be exploited within an opti- 
mization algorithm. This has created an interest in developing specialized ap- 
proaches to solving such problems. 

There have been various first-order methods developed for the solution of 
([3]) and ([5]). Let us mention the ones known to be the most efficient. 

Gradient Projection Sparse Reconstruction GPSR [17] solves program 
At each step of the algorithm a line search is performed along the neg- 



(5a 



ative gradient direction and the new iterate is projected to the nonnegative 
orthant. 

Fixed Point Continuation Active Set FPC_AS [3T], [32] solves program 
(5a). FPC_AS is a two stage algorithm. At the first stage a shrinkage scheme 



is employed which aims to spot quickly the nonzero components of the 
sparse representation. Then the second stage is enabled to solve a smooth 



version of ( 5a ) limited to the indexes of nonzero components found by the 



first stage of the algorithm. 

Spectral Projected Gradient SPGL1 [3J solves any of the programs (J3j , 



(5b) and (5c). The SPGL1 is a spectral projection gradient algorithm which 
iteratively solves (5b) for some values of t\ in order to get a solution of 
(5c). 
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— Nest A j2] solves program ( [5c] ) by using a variant of the Nesterov's smooth- 
ing gradient algorithm [24j. which has been proved to have the optimal 
bound O(^) on the number of iterations, where e is the required accuracy. 

Independently there have been several attempts to design suitable interior 
point method (IPM) implementations. The most efficient among them, which 
can also handle large scale CS problems, are listed below. 



— li_l s algorithm [22 solves a reformulation of program (5a) which allows 
a straightforward preconditioning of the Newton equation system that is 
solved with a conjugate gradient method. 

— PDCO algorithm [55] solves regularized versions of programs ^ and (|5a|). 
The Newton equation system is solved by applying an LSQR method |25j . 

Both li-£ s and PDCO have been demonstrated to be robust in comparison 
with other interior point method implementations. However, they are not as 
accurate and as fast as state-of-the-art first-order methods. 

In this paper we present a new interior point method specialized to com- 
pressed sensing problems. First, our IPM is matrix-free [20] . i.e. the explicit 
problem formulation is avoided and the measurement matrix A is used as an 
operator to produce results of matrix-vector products Ax and A T y. We rely 
on the fact that for many measurement matrices that appear in sparse signal 
reconstruction problems there are super-fast (e.g. 0(n) or O(nlogn) com- 
plexity) algorithms of multiplication by a vector. Second, Newton direction 
at each step of the IPM is found approximately as a solution of the corre- 
sponding normal equation obtained by the preconditioned conjugate gradient 
algorithm. We propose a very efficient preconditioner that is based on the fact 
that sub-matrices of A with a given number of columns are uniformly well- 
conditioned (this is called the Restricted Isometry Property, see the discussion 
in Section [2]). The objective of our developments is to design an IPM which 
preserves the main advantage of IPM, that is, it converges in merely a few 
iterations, and removes the main drawback of IPM, that is, avoids expensive 
computations of the Newton direction. Ideally, we would like to solve the com- 
pressed sensing problems in O(logn) IPM iterations and keep the cost of a 
single IPM iteration as low as possible and not exceeding O(nlogn). 

The paper is organized as follows. In Section [2] we discuss the particular 
features of compressed sensing matrices that are exploited in our approach. In 
Section [3] we reformulate sparse recovery optimization problems ^ and ( |5a[) 
to make them suitable for the matrix-free interior point method. Section H] 
concerns finding approximate Newton directions required at each step of the 
IPM. We give two alternative normal equations systems formulations of the 
above stated problem an analyze their properties. For one of the systems we 
propose an efficient preconditioner that can be used in the preconditioned 
conjugate gradient method. We prove that under certain conditions (that are 
satisfied in practice) eigenvalues of the preconditioned matrix are well clustered 
around 1. In Section [5] we compare the proposed matrix- free IPM with other 
state-of-the-art first and second order solvers. We assess the performance of the 
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matrix-free IPM in terms of speed and quality of reconstruction on problems 
from the Sparco test suite |4j. 



2 Properties of Compressed Sensing Matrices 

Matrices which appear in sparse reconstruction problems originate from dif- 
ferent bases in which signals are represented. What they all have in common 
are the conditions that guarantee recoverability of the sparsest solution of 
by means of the ^i-norm minimization The restricted isometry property 
(RIP) [51IT5] is one of such conditions which shows how efficiently a measure- 
ment matrix captures information about sparse signals. 

Definition 1 The restricted isometry constant 5k of a matrix A £ flj™ 1 *™ j s 
defined as the smallest 5k such that 

(i-« fc )Wl<ll^lli<(i + ^)INl2 (6) 

for all /c-sparse x £ K ra . 

In words, statement ([6| requires that all column sub-matrices of A with at 
most k columns be well-conditioned. Informally, A is said to satisfy the RIP if 
5k is small for a reasonably large k. The next theorem due to [18] establishes 
the relation between the RIP property and the sparse recovery. 

Theorem 1 Every k- sparse vector x £ K.™ satisfying Ax — b is the unique 
solution of ([3]) if 

5 2k < — Pa 0.4652. 
4 + V6 

The restricted isometry property also implies stable recovery by ^i-norm 
minimization for vectors that can be well approximated by sparse ones, and it 
further implies robustness under noise on the measurements [5]. 

RIP is a very restrictive condition that depends on the size of the mea- 
surement matrix A. Clearly, the more columns n matrix A has (the larger 
the size of the vector x to recover) the larger §k in ^ is (the harder it is 
to guarantee sparse recovery). On the other hand, number of rows m of A 
is the number of measurements taken and, hence, the RIP constant Sk de- 
creases with m. Currently known measurement matrices satisfying RIP with 
small number of measurements fall into two categories |27j : (i) random ma- 
trices with i.i.d. sub-Gaussian variables, e.g., normalized i.i.d. Gaussian or 
Bernoulli matrices; (ii) random partial bounded orthogonal matrices obtained 
by choosing m rows uniformly at random from a normalized n X n Fourier 
or Walsh-Hadamard transform matrices. Number of measurements required 
to satisfy the RIP property for both classes of matrices is given in the table 
below. 

Although it follows from the table that Gaussian matrices are optimal for 
sparse recovery, they have limited use in practice because many applications 
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m X n measurement matrix RIP regime references 

Gaussian m > Cklogn |27l Tl1 

partial Fourier m>Cklog 4 n I27| 

Table 1 List of measurement matrices that have been proven to satisfy RIP 



impose structure on the matrix. Furthermore, recovery algorithms are signif- 
icantly more efficient when the matrix admits a fast matrix- vector product. 
In the rest of the paper we assume that matrix A is not stored explicitly and 
that matrix- vector product with A is cheap (e.g. O(nlogn) or 0(n)). How- 
ever, our results are equally valid for dense random matrices such as Gaussian 
or Bernoulli. 

An important broad class of compressed sensing matrices comes from ran- 
dom sampling in bounded orthonormal systems. Partial Fourier matrix men- 
tioned earlier is just one example of this type. Other examples are matrices 
related to systems of real trigonometric polynomials (partial discrete cosine 
(DCT) and discrete sine (DST) matrices), Haar wavelets and noiselets. Quite 
often in applications a signal is sparse with respect to a basis different from 
the one in which measurements are made. Then it is said that a measure- 
ment/sparsity pair is given 6,11. Assume that a vector z is sparse with respect 
to the basis of columns of a unitary matrix (sparsity matrix), i.e. z = tyx 
for a fc-sparse vector x. Further, assume that z is sampled with respect to the 
basis of columns of a unitary matrix $ (measurement matrix): y — R m & z, 
where R m is a random sampling operator. Hence, matrix A in ([I]) is equal to 
R m <P T 'l r and its rows are orthonormal: 

AA T = I m . (7) 

The recoverability property of matrix A depends on the value of the so-called 
mutual coherence /i(<£, <?") of the measurement /sparsity pair (see [12]): 

= y/nmax.\(^ipj) \. (8) 

Coherence simply measures the largest correlation between any two elements 
of <P and W. Next theorem due to [5] shows that the smaller the value of mutual 
coherence the better the recoverability property of matrix A. 

Theorem 2 Fix z G R" and suppose that the coefficient sequence x of z in 
the basis \P is k-sparse. Select m measurements in the <P domain uniformly at 
random. Then if 

m > Ckfj,($,&f logn (9) 

for some positive constant C , then with overwhelming probability the vector x 
is the unique solution to the l\-minimization problem ^ with A = i?„ i <P T !Z /r . 

Let us note that condition ^ differs from those given in Table [T] Condi- 
tions in Table [l] ensure that once the random matrix is chosen, then with high 
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probability all sparse signals can be recovered (uniform recovery). Although, 
([9]) only guarantees that each fixed sparse signal can be recovered with high 
probability using a random draw of the matrix (nonuniform recovery). 

To conclude, compressed sensing matrices have many useful properties that 
must be taken into account in the development of an efficient matrix-free IPM 
solver. In the current paper we make use only of the most general of them 
that are satisfied by every compressed sensing matrix. First, we weaken a 
little bit the condition of orthonormality Q to include random matrices such 
as Gaussian and Bernoulli: 

PI: Rows of matrix A are close to orthonormal, i.e. there exists a small 5 such 
that 

\\AA T - I m \\ 2 < 6. (10) 

Restricted isometry property ^ on the contrary assumes that columns of 
A are normalized. So, our interpretation of the RIP property that will be used 
throughout the paper is as follows. 

P2: Every k columns of A with k <C m are almost orthogonal and have similar 
norms, i.e. for every matrix B composed of arbitrary k columns of A 



-B'B-I k 
m 



< 4- (11) 

2 



3 Primal Dual formulations in Matrix-free IPM 

Non-smooth Basis Pursuit ^ and Basis Pursuit Denoising (5a) optimization 
programs can be reformulated into equivalent linear and convex quadratic 
programs, respectively. This is achieved via the linearization of the non-smooth 
^i-norm in the objective function. 
Let us define 

\xi\=u.i+v u Vi=l, ...,n, (12) 

where u,; = max(xi,0) and Vi = max(— Xj,0). Then the linearization of the 
^i-norm is 

11x11! = 1>+1>, (13) 

with u, v > and l n € R n being a column vector of all ones. Once optimal 
values of variables u and v are found the solution x of the initial program is 
retrieved by computing 

x = u — v. 

The technique described above is used in the very successful GPSR algo- 
rithm |17j . The new equivalent BPDN program is 

min rlLz + MF T z - b\\l 

subject to: z > 0, 



s 



where 



v\ € 



p 2 1 1 



F T = [A - A] € 



pmx2n 



and b G 



The price 



for the linearization is that comparing to the initial BPDN program (5a) the 
dimension of the problem is doubled and 2n new non-negativity constraints 



are added. The same technique as in ( 13 1 can be applied for the linearization 
of the objective function in the BP program (J3j|. 



We solve the quadratic Basis Pursuit Denoising problem (14 1 using the 
primal-dual interior point method. The reader interested in the theory of IPMs 
is referred to the book of Wright 33 . Aspects of practical implementation have 
been addressed in a recent survey [19] . For the primal program ( 14 ) of interest 
the dual is 



Dual Sep.: 



max — \z T FF T z 
subject to: c + FF T z — s 
z,s > 0. 



(15) 



where c — 



e 



\l n -A T b 

Tl n + A T b 

At each step of the primal-dual interior point method 



applied to ( 14 ) 



( 15 1 the corresponding Newton direction (Az, As) is computed by solving the 
following system of linear equations: 



' FF T 


—Iln 


X 


Az 




u 


S 


z 




As 




fs 



(16) 



where S and Z are diagonal matrices with vectors s and z on the diagonal, 
respectively, li n denotes an identity matrix of dimension 2n and (/ z , f s ) is an 
appropriately computed right-hand side vector. 



In the matrix-free framework the dual variables As in (161 are eliminated 
to get: 

(0- l +FF T )Az = f z + Z- l f s , (17a) 



As = Z~ l f s - e- x Az. (17b) 



where & = S~ X Z e jj2nx2n^ j,^ e rec j ucec [ N ew ton system (17a|, also known 
as augmented system, is solved by an appropriate preconditioned iterative 
method for which only matrix-vector product with the constraint matrix F is 
allowed. Thus, the matrix-free IPM approach has two major components: 

— iterative solver for the augmented system; 

— special-purpose preconditioner that exploits matrix structure. 



The next section addresses these two issues. 
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4 Preconditioned Conjugate Gradient Method 



The system (17a) has a symmetric positive definite matrix and the conjugate 
gradient (CG) method can be employed to solve it in a matrix-free regime. 
However, the convergence of the CG method can be too slow when a matrix is 
ill-conditioned and/or its eigenvalues are not clustered. In this section we dis- 



cuss an efficient spectrally-equivalent diagonal matrix preconditioner for (17a I. 
In particular, we give theoretical and practical justification of our approach to 
fast iterative solution of the system. 



The proposed preconditioner for the system of equations (17a) is based on 
the exploitation of general properties of compressed sensing matrices and the 
behavior of the matrix in (17a) close to optimality. Let us recall that in the 



notation of primal-dual pair ( 14 (— ( 15 ), variable s € M is a Lagrange multi- 



plier associated with the non-negativity constraint z > 0. Hence, at optimality 
SjZj = Q Vj = 1, 2, . . . , 2n. IPMs force the convergence to the optimal solution 
by perturbing this condition SjZj = fi, Vj, where \i is the barrier term of the 
IPM, and gradually reducing the perturbation \i to zero. At optimality indices 
j G {1,2,..., 2n} are split into two disjoint sets: 

B = {j | zj -t z* > 0, Sj -t s* = 0} 

and (18) 
Af = {j\z j ^z*=0,s j ^s*>0} 

that determine the activity of constraints. This partitioning has highly unde- 
sirable consequences for the diagonal scaling matrix = S~ X Z . Indeed, when 
fi approaches zero, for indices j € B, 0j goes to infinity and for indices j G TV, 
0j goes to zero. 

Recall that z = [u ; v], where u and v are the positive and negative com- 



ponents of vector x (see ( 12 )), respectively. For sparse signals there are merely 
k (fc <C 2n) nonzero components in the optimal solution. The positive ones 
will contribute a nonzero element in u and the negative ones will contribute 
a nonzero element in v. At optimality the cardinality of set B is k. Hence, at 
later iterations of an IPM 

0i > 1 \fieB, cardS = k, 

(19) 

6> s ; < 1 VieAA, cardAA= 2n- k. 
Let us now return to the question of preconditioning of the system of 



equations (17a). Its matrix is 

H = 0- 1 +FF T . (20) 



The behavior of matrix near optimality is described by ( 19 ). It is clear that 
matrix (9 -1 has many large entries and only few small entries well before the 
IPM reaches the optimal solution. Let us introduce a number C ^> 1 that 
separates entries of 0~ 1 of different magnitudes: 

#(©- 1 < C) = I. (21) 
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Here I is just the number of small entries in (9 _1 and may be different from 
the sparsity k of the optimal solution. In the regime / < m, the second term 
FF T , whose rank is exactly m, works as a low-rank pertubation for the matrix 
H in (20 1. Since, in Frobenious norm the first term 0" 1 dominates the second 
term FF T , we propose to replace FF T in the preconditioner by a simple 
approximant. First, let us write system's matrix of (17a I in the block form by 
using the facts that — diag((9„, V ) and F T — [A — A]: 

A T A -A T A~] 
-A T A A T A 



H 



'On' 


+ 


0-\ 





(22) 



Our preconditioner is based on the approximation of A 1 A by the closest (in 
Frobenious norm) scaled identity matrix pl ni p — m/n: 



P : 



&u 1 + pin -Pin 
-pl n 0- 1 + pl n 



(23) 



To simplify the analysis of the preconditioner, we first consider the case of 
n x n matrices H and P rather than block 2n x 2n ones as defined by ( 22 ) and 
( 23 1 . The following lemma establishes spectral properties of the preconditioned 
matrix P~ 1 H in the non-block case. 

Lemma 1 Define matrix H as 

H = 0- 1 +A T A, 

with — diag(6*i, 6*2, ... , n ) — diagonal n x n matrix with 0j > 0, and A 
- m x n matrix (m < n/2) with orthonormal rows. Let A satisfy property 
P2 described on page^for k — I (I is defined in (21 )j with some constant Si. 
Then the eigenvalues of matrix H preconditioned by matrix P: 

P = 0~ x + pl n , p = m/n 

are clustered around 1, i.e. 



|A-1| <5 t + 



1 

pSiC' 



VA e spcc(P" 1 iJ). 



(24) 



Proof Let us define two disjoint sets of indices: 

Be = {je {1,2,. ..,n}: 0J 1 <C}, Af c = {l,2,. 



,n}\B c . 



Let B and N be matrices of columns of A with indices from Be and Afc, 
respectively. Without loss of generality we can assume that Be are the first I 
indices, then 

A=[BN], BeR mxl , NeR mx{n - l) . 

Let A be an eigenvalue of the preconditioned matrix P~ 1 H corresponding 
to an eigenvector v — [vb c ; %] of norm one, then 



P~ x Hv 



Xv (H - P)v = tPv, t = A - 1, 



(25) 
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or, in the block form. 



B T B - pit 


B T N 




v Bc 


= T 







N T B 


N T N-pI n -i_ 




VAf _ 










% 



(26) 

Obviously, eigenvalues of P 1 H are all real, hence t is also real. Multiplication 
of (26) by [v Bc ; v^f c ] T from the left gives 



B C (°Bc + P 1 *) V ®c + v Af c ( e N c + Ptn-l) «JV C 

<l G (B T B - pl t ) v Bc + v T Nc (N T N - pl n _t) VMo + 2v t Bc B t Nvm c ■ (27) 



Let us denote ||wb c ||| by a, then UfAfclli = 1 — a since v = [v Bc 
has unit norm. Bounding left hand side of (27) from below is trivial: 



V B C ( e Bc + Ph) V B C + V N C (°Mc + Ptn-l) % > l r l (p a + C(l - Of)). 

(28) 

Next, let us bound right hand side of (27) from above. For this purpose we 
decomposition of m; 

B = US B V^, S b 



will use the SVD decomposition of matrix B: 

"diag(cri,...,er ; ) 



Ixl 



and the fact that 



Im = AA T = BB T + NN T . 



The latter means that matrices BB T and iViV T share the same eigenbasis and 
that 

N = U SffV N , Sb^b + ZJn^n = Im- 
Hence, singular values of matrix N are known 

S N = [diag(- v /l - a\, y/1 - erf, . . . , l m )|Omx(n-m-o] ' 

Restricted isometry property P2 implies that 

al<p(l + 5i), af>p(l-5i). 

First, notice that 

v T Bc (B T B - ph) v Bc | < pSta. (29) 
Next, using SVD decomposition of N obtain 
\\N T N - pl n -t\\ 2 = max {p,l - p,l - p - a?} = 1 - p, (p = m/n < 0.5) 



and, hence, 



"Xb (N T N - pl n _t) v Ng J < (l - p) (l - a) . 



(30) 
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Finally, the SVD decomposition of B T N is 

B T N = V b S T b SnV^. 

Hence, 

\\B T N\\ 2 = ! max ( ^a^of) < ^ - a\) < 1/5 



because the maximum value of function f(x) = -^/x(l — x) on the interval [0, 1] 
is 1/2 and our assumptions m < n/2 and Si < 1 imply of < o\ < p(l + Si) < 1. 
We conclude that 

24 c B T iV Wc < y/ a (i - a). (31) 



Bounds (30) and (31) are sharp and can be used to obtain very tight 
estimate on r but we do not need them that sharp to obtain a sufficiently 
good estimate. So, we will release them a little bit to simplify the analysis: 



vjfo (N T N - p/ n _ ; ) v Nc < (1 - a) < Vl 
2vl c B T N VA f c < y/a(l-a) < y/l-a. 



Using (|28j) and (|29) and (|32j) we finally get 
pa + C (1 — a) 



(32) 



(33) 



Let us show that e is small for large values of C. Indeed ( 33 ) implies that 

2^1^ < 5i (c + Ce - pej (1 - a) + p^e. 

It can be checked by simple calculus, that yjx < C\x + Cy. on [0, 1] whenever 
Ci > In our case this implies 

Siic + Ce-pe) > (/*J,e)~\ 
The largest solution of the quadratic equation in e 

p6?e(c + Ce- pe) = 1 



C 



2 C-p 



pSfC 



C-P 
C 



1 < (pSfC 



Hence, it is sufficient to take any e > ( pSfC ) to satisfy the inequality (33 1 



M < 5 t + 



pSiG 



(34) 



This completes the proof. 
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For the result of the theorem to be useful we obviously need the bound in 



the right-hand side of inequality (34 1 to be sufficiently smaller than one. Let us 
take a closer look at the terms forming this bound. We are free to choose any 
value for the constant C we want, the larger the better. However, according 



to (21), I increases with the increase in C and, consequently, the restricted 



isometry constant 5i also increases. Inequality (34 1 holds for any value of C, 
hence we can replace it with 

|r| < mm (s t + ^) (35) 

and choose constant C that delivers the minimum. 

It is natural to assume the restricted isometry constant 5i to be less than 
1/2 (see Theorem [TJ. Number of measurements m is usually just a fraction p 
of the length n of the unknown signal, say p = 1/4. Hence, to have |t| < 5/6 
we need C = 24 i n (|2l"| ), which certainly holds near optimality in the IPM. 



The bound in (34 1 is rather pessimistic. Computational experience suggests 
that eigenvalues of the preconditioned matrix get well clustered around 1 as 
long as I — ^(OJ 1 < 1) is such that the RIP constant Si < 1. For example, for 
the discrete cosine (DCT) matrix with n = 2 10 and m — 2 8 the corresponding 
I < 74 (this number is obtained in a series of random tests). 

Now we are ready to state the spectral properties of the preconditioned 



matrix P 1 H for the system of equations (17a). We leave the theorem without 



a proof as it a straightforward corollary of Lemma [T] 



Theorem 3 Let H and P be block matrices defined in (22) and (23), respec- 
tively. Then the preconditioned matrix P~ X H has 

1. the eigenvalue 1 of multiplicity n; 

2. remaining n eigenvalues defined in Lemma [7] with = © u + V . 

Theorem [3] establishes the clustering of eigenvalues of P _1 iJ around 1. 
Hence, iterative method such as conjugate gradient applied to the system of 
equations ( |17a| is expected to converge in just a few iterations if the precon- 



ditioner P in (23) is used. The latter theoretical results are also confirmed in 
practical experiments. Figure [l] demonstrates clustering of eigenvalues X(H) 



and X(P~ 1 H) in the case that the A matrix in H (23) is a Discrete Cosine 
Transform (DCT) matrix with normalized rows, AA r ^= I. The parameters 
for the size of the problem are set to m — 2 10 , n — 2 12 and the sparsity level is 



fixed to k = 51. In the left sub-figure la the clustering of the eigenvalues X(H) 



is shown. Every vertical line presents the spreading of \(H) at a particular 
CG call as the matrix-free IPM progresses. One can observe that the cluster- 
ing worsens as the matrix-free IPM approaches optimality. On the contrary, 
eigenvalues of the preconditioned matrices P~ X PL show the opposite behavior. 
In particular, as the matrix-free IPM progresses eigenvalues X(P~ 1 H) start 
to cluster around one. The latter is depicted with the vertical columns in the 
right sub-figure [Tb"} 
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Spread of UU) per call of CG 




number of CG call 

(a) Unprcconditioned systems, H 



Spread of MP _1 H) per call of PCG 



III 



II 



[III 



number of PCG call 

(b) Preconditioned systems, P~ 1 H 



Fig. 1 Clustering of the eigenvalues for the matrices H and P~ H as the matrix-free IPM 
approaches optimality. The matrix A in H |2'2| is a DCT matrix with normalized rows. The 
parameters of the problem set to m = 2 lf ',n = 2 12 and k = 51. Twenty systems for the 
matrices H and P~ X H are solved in total 



5 Computational Experience 

We illustrate our developments by comparing the matrix-free IPM's efficiency 
with those of the state-of-the-art first order methods, FPC_AS and SPGL1 
and with two other interior point based solvers, the l\JL s and the PDCO. The 
experiments are made on the Sparco test suite [I]. 

We use the FPC_AS CG version of the FPC_AS algorithm, where "CG" 
stands for the conjugate gradient method. The FPC_AS CG has been shown in 
[31] to be considerably faster than other versions of the FPC and FPC_AS soft- 



ware packages. The FPC_AS CG solves program (5a). The code of the FPC_AS 
CG package can be found at http://www.caam.rice.edu/~optimization/ 
|L1/FPC_AS7| We use the SPGLl.bp version of the SPGL1 software pack- 
age for noiseless signals and the SPGLl_bpdn version for noisy signals, where 
"bp" stands for basis pursuit and "bpdn" for basis pursuit denoising, respec- 
tively. The SPGLl_bp solves program ([3]) and the "bpdn" version solves pro- 
gram (5c). The code of the SPGL1 package can be found at |http : / /www . 



cs . ubc . ca/labs/ scl/spgll . Those versions of the FPC_AS and SPGL1 soft- 



ware packages were found to be faster and more accurate than other first 



order methods mentioned in subsection 5.1 Therefore, GPSR and Nest A 



solvers are excluded from the comparison. The l\Jt s solver implements pro- 



gram (5a), it can be found at http : //www. Stanford. edu/~boyd/ll_ls/ The 
PDCO solver is used through the file SolveFasBP.m of the SparseLab soft- 
ware package. The PDCO solver can be found at http: //www. Stanford. 
edu/group/SOL/software/pdco.html and the SparseLab software package at 



http : //sparselab . Stanford . edu/ . 
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Table 2 Symbols and abbreviations used in tables and figures in the Benchmarks, Com- 
parison and Robustness to noise subsections. 



m,n,k 


number of rows and columns of the matrix A and the number of 




nonzero elements in the optimal sparsest signal representation. 


X 


optimal sparse representation. 


r.e 


relative error \\x\/y — x||2/||xwl|2t where x\y. = x^ if i S W otherwise 




x w . = and W := {i = 1, 2, ...n | &i ^ 0}. 


nMat 


total number of matrix vector products Ax and A J y. 



Table 3 Sparco collection 



Problem 


ID 


m,n 


Operator 


Pill 


1121. [T3l blocksig 


2 


1024, 1024 


wavelet 


4.5e+02 


[12], [13] blkheavi 


9 


128, 128 


hcaviside 


4.1e+01 


[12], [13] blknheavi 


10 


1024, 1024 


normal, heaviside 


9.8e+02 


17 blurrycam 


701 


65536, 65536 


blurring, wavelet 


1.0e+04 


fTT] blurspike 


702 


16384, 16384 


blurring 


3.4e+02 


cosspike 


3 


1024, 2048 


DCT 


2.2e+02 


[2T1 jitter 


902 


200, 1000 


DCT 


1.7e+00 


[7] sgnspike 


7 


600, 2560 


Gaussian ens. 


2.0e+01 


|16| spikctrn 


903 


1024,1024 


ID convolution 


1.3c+01 



In addition, two more experiments are performed. The first one tests the 
robustness of matrix-free IPM given a fixed level of noise on certain problems 
and the second one demonstrates that matrix-free IPM has optimal phase 
transition |14] properties. 

Before proceeding to the following subsections it would be convenient for 
the reader to be familiarized with symbols and abbreviations used in the sub- 
sequent figures and comparison tables explained in Table [2j 



5.1 Benchmarks 

In order to have a base of comparison we choose to show the efficiency of 
the matrix-free IPM on already existing benchmarks, which have been used 
by several researchers including [31] . [3] and [3]. Experiments are performed 
on nine real valued sparse reconstruction problems (Table [3| from the Sparco 
collection gj. For problems in Table [3] with ID's 701 and 702 the optimal 
representation x is not given by the Sparco toolbox. Therefore, the SPGLl_bp 
solver is used to obtain x with required high accuracy. Since some of the 
components of the obtained solution from the SPGLl_bp might not be exactly 
zero we consider as nonzero components the ones in the set nnz(x) :— {k = 
l,...,n I Yli=i — 0.999|jx|ji}, where x is the sorted by absolute value 
vector x. Then we set xwi = Xi if i £ W otherwise xw t = 0, where W := {i = 
1, 2, ...n I i S nnz(a;)}. 
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Noise is introduced to the noiseless measurements b using the following 
command in MATLAB: 

b = awgn(6. SNR, 'measured'), (36) 

The function awgn is a MATLAB function from Communications Systems 
Toolbox which adds white Gaussian noise to the signal b. The SNR is the 
signal to noise ratio, in dB. The 'measured' option specifies that the power of 
the signal is calculated first before the addition of the noise. 



5.2 Termination Criteria and Parameter Tuning 

We force the termination of the compared solvers when a solution of similar 
accuracy to the one of the matrix-free IPM is obtained. In order to do so, 
we incorporate two additional termination criteria. The solution process is 
terminated when at a current iteration the projected relative error, defined by 
r.e, is smaller than the projected relative error obtained by the matrix-free 
IPM. Occasionally, certain solvers required too many matrix-vector products 
without achieving a solution of the similar quality to the one delivered by 
the matrix-free IPM. In this case the solvers were terminated when nMat > 
40,000. 

Regarding the parameter tuning of the FPC_AS CG solver we set the 
parameter sub_mxiter equal to 10 and 80 for noiseless and noisy measurements, 
respectively. For the SPGLl_bpdn solver we set the upper bound 62 = ||e||2 in 



(5b), where e € E m is the known vector of noise added to the measurements. 



Any other parameters are set to their defaults values. For all the solvers that 



implement program (5a I the rcgularization parameter r is set as shown in 
Table H 



5.3 Comparison 

In this section we present the computational results obtained for the Sparco 
collection problems discussed in the Benchmarks section. Both noisy and noise- 
less measurements are considered. Noise is added to the measurements by fix- 
ing the SNR= 60 db. A comparison among the previously mentioned solvers is 
made in terms of the quality of reconstruction and computational effort. The 
results of experiments are shown in Table [4] The first column in Table [4] shows 
ID of the Sparco problem. For each ID first and second sub-rows give results 
for noisy and noiseless measurements, respectively. The values for the regu- 
larization parameter r are shown in the second column of Table [4] The third 
column reports the relative error that was achieved in the matrix-free IPM. 
The rest of the table shows the number of matrix-vector products, nMat, that 
were needed by each solver to reconstruct a solution of similar quality to the 
one of the matrix- free IPM. In cases when number of matrix- vector products 
required by a solver exceeded 40, 000, the solver was terminated with a failure 
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Table 4 Results for noiseless and noisy Sparco problems 











mfipm 




PDCO 


FPC_AS CG 


SPGL1 


ID 






r.c 






nMat 








1 


ue-uo 


o.ue-U4 


ox 


A Q 

To 


Do 1 


o 
y 


4UUUU 


2 


1 

1 


rin i n 
ue-iu 


i.ue- 1 1 


DO 


no 
00 


4UUU l 


4UUUZ 


ZZ 


3 


1 

1 


fin m 

ue-Uo 


i .ue-U4 


A 1 
Z41 


4DZ 


A O/l 1 

4y4i 


1U0 


4UUUU 




1 

1 


fin fl7 


1. ue-Uo 


-1 "1 f, 
410 


1D1Z 


4U1D i 


OT O 
Z1Z 


"1 A Q 

I'lO 


5 


1 

1 


Ue-Uo 


Z. Ue-Uo 


oyyi 


yo4z 


Ze5ZUo 


oZl 


4UUUU 




1 

1 


Ue-U4 


Z. Ue-Uo 


/ yoo 


1 OfiQ /I 


41Zoo 


o ( 4 


Zoo l 


7 


1 

1 


UC-Uo 


4. Ue-Uo 


"1 TO 


Z I z 


-izo 


UZ 


oy 




1 

1 


fin fl7 

ue-u / 


l.ue-UD 


Zoo 


ooU 


OU1 


/ D 


ol 




1 


Oe-03 


1. Oe-03 


689 


1546 


7065 


1680 


40000 


9 


1 


0e-10 


5.0e-12 


649 


1886 


6845 


40016 


40000 


10 


1 


Oe-03 


1.0e-03 


4775 


8529 


6203 


40002 


40000 


1 


0e-10 


9.0e-10 


4567 


8192 


41227 


40161 


40000 


701 


1 


Oe-08 


2.0e-02 


947 


1794 


5967 


1049 


40000 


1 


0e-10 


7.0e-09 


1341 


2656 


42041 


40017 


15239 


702 


1 


Oe-08 


4.0e-03 


809 


1574 


3341 


40001 


40000 


1 


0e-10 


1.0e-07 


1123 


3030 


49563 


40157 


11089 


902 


1 


Oe-05 


3.0e-04 


181 


588 


261 


40 


40000 


1 


Oe-07 


2.0e-06 


225 


675 


275 


42 


59 


903 


1 


0e-01 


7.0e-03 


2201 


4944 


5939 


11395 


8941 


1 


Oe-05 


3.0e-06 


4173 


25114 


29901 


40161 


40000 



status. To be precise, it is a failure to converge to a solution similar to the 
one obtained with matrix-free IPM. Problems for which the matrix-free IPM 
needed fewer matrix-vector products are denoted in bold. One can observe in 
Table [4] that the matrix-free IPM was the fastest solver in 10 out of 20 noisy 
and noiseless problems. 



5.4 Robustness to noise 

In this subsection we test the matrix- free IPM solver for different levels of noise 
and show that it is able to reconstruct the optimal representation regardless 
of the level of noise. We measure the quality of reconstruction by using the 
following criterion: 



\X W - X\\ 2 

AMP2 = v — t — , (37) 

which is relative to the level of noise. The level of noise varies from SNR = 10 
to SNR = 120 with a step of 10. Since the minimum AMP2 is unknown, 
it is obtained by using the FPC_AS CG solver with required high accuracy. 
The obtained AMP2 will be considered as optimal and we will show that the 
matrix-free IPM achieves the same level of accuracy. Additionally, the FPC_AS 



CG solver implements the formulation ( 5a ) which depends on the pre-defined 



regularization parameter r, hence, the optimal solution also depends on r. For 
the latter reason different results for different r parameters are demonstrated 
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Robusteness in Noise: Sparco problem 2 



Robusteness in Noise: Sparco problem 9 




+ Optimal 

O Optimal st aled down : by 1000 
3fc Non optimal : scaled up T by 100 

Non optimal scaled up ; by 1000 
□ mfipm 



log(SNR) (SNR in dB) 

(a) Results for the Sparco problem 2 



* * 
* • 

A » 



+ Optimal 

O Opl mal scaeo down r by 1000 
♦ Non optimal : scaled up r by 100 
Non optimal, scaled up t by 1000 
□ mfipm 



log(SNR) (SNR in dB) 

(b) Results for the Sparco problem 9 



Fig. 2 Optimal curves of the AMP2 as a function of the SNR for the Sparco problems 2 
and 9. The curve of crosses shows the optimal AMP2 for certain values of f . The curve of 
circles shows the same optimal AMP2 curve but for values of f divided by 1000. The curves 
of stars and triangles show the non-optimal AMP2 for values of f multiplied by 100 and 
1000, respectively. The curve of squares shows the obtained AMP2 from the matrix-free IPM 
solver. Occasionally, all five symbols overlap because they correspond to optimal values of 
AMP2 



in order to show that the reconstructed solutions from FPC_AS CG are indeed 
the best possible given a particular level of noise. The termination criteria for 
the FPCLAS CG are set to gtol = l.Oe - 10 and mxitr = 2000, any other 
parameters are set to their default values. For each level of SNR, 50 vectors e 
are randomly generated and added to the noiseless measurements. The average 
results are reported. We apply the above methodology to the Sparco problems 
with IDs 2 and 9. The results are demonstrated in Figure [2] 

In Figure [2] we show the level of noise, SNR, against the criterion AMP2. 
The values of f which correspond to the optimal AMP2 are found experimen- 
tally. The results for the optimal values of f are denoted by the cross symbols. 
If the optimal f values are divided by a factor of 1000 then the AMP2 remains 
the same, the latter results are denoted with the circle symbols. Moreover, if 
the optimal values of f are multiplied by the factors of 100 and 1000 then the 
AMP2 criterion worsens. The latter results are denoted by the star and trian- 
gle symbols for the factors of 100 and 1000, respectively. Finally, the square 
symbols show the AMP2 obtained using the matrix-free IPM solver. In both 
experiments the matrix-free IPM reconstructed solutions which correspond to 



the optimal AMP2, as shown in Figures 2a and 2b 



5.5 Optimal Phase Transition 



Recently, it has been shown in [14] that for any problem instance (A,b) there 
is a maximum ratio v p = k/m given p = m/n that if exceeded then the formu- 
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Phase transition for matrix-free IPM 




Fig. 3 Empirical phase transition for matrix-free IPM. The solid curve denotes the theo- 
retically optimal phase transition. The dashed curve denotes the empirical phase transition 
for 50% success rate of matrix-free IPM 



rations ^ or (5a I will fail to reconstruct the optimal sparse representation, 
while the reconstruction is guaranteed for v p <v p . The latter, has been intro- 
duced as the notion of phase transition. Ideally, an efficient £i-regularization 
solver should have empirical phase transition at the same level i7 p . 

In this section we show that the matrix-free IPM has optimal phase tran- 
sition properties by reproducing a similar experiment as in section two of 
[14] . Let us now briefly explain the experiment. The parameter n is hxed to 
n = 1000. The measurements m are varied from m = 100 to m = 900 with a 
step of 100. For each of the nine measurements m the sparsity of the optimal 
representation is varied from k = 1 to k = m with a step of one and for each k 
100 trials are conducted. The censing matrix A is chosen by taking randomly m 
rows from anxn normalized discrete cosine transform matrix. Each nonzero 
coefficient of the sparse representation is set to ±1 with equal probability, 
while the sparsity pattern is chosen at random. All the generated problems 
are solved using the matrix-free IPM solver, the reconstruction is considered 
successful when r.e < 1.0e-5. For each ratio v p we compute the success ratio 
p(v p ) = SyiOO, where S is the number of trials for which the r.e < 1.0e-5. It 
has been demonstrated empiricaly in (14j that for any problem instance (A,b) a 
solver with optimal reconstruction properties has max{v p | p(v p ) > 0.5} ~ i7 p . 
The latter means that the empirical phase transition for 50% success rate over- 
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laps with the theoretically optimal phase transition. Therefore, we only plot 
the empirical phase transition for 50% success rate of the matrix-free IPM and 
the optimal phase transition, which they overlap in the particular experiment. 
The results are presented in Figure [3j 

6 Conclusions 

We propose and implement a computationally inexpensive matrix-free IPM, 
based on |20| . for the l\ -regularized programs arising in the held of Com- 
pressed Sensing. Since CS problems result in ill-conditioned systems at each 
iteration of an IPM, we propose a low cost preconditioner for the conjugate 
gradient method. The proposed preconditioning technique exploits features of 
Compressed Sensing matrices as well as Interior Point Methods. Its efficiency 
is justified theoretically and confirmed in numerical experiments. 

Our computational experience shows that although the CS research com- 
munity seems to favor first-order methods, a specialized (matrix-free) interior 
point method is very competitive and offers a viable alternative. 
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